Managing host computing devices with a host control component

ABSTRACT

Systems and methods are disclosed which facilitate the management of host computing devices through the utilization of a host computing device control component. The host computing device control component includes a state monitoring component that monitors operating states of the control component and a group of associated host computing devices. Based on a determination of a power event, the state monitoring component causes the initiation of a reboot of the group of host computing devices based on exchanged priority information and additional reboot parameters.

BACKGROUND

Generally described, computing devices utilize a communication network, or a series of communication networks, to exchange data. Companies and organizations operate computer networks that interconnect a number of computing devices to support operations or provide services to third parties. The computing systems can be located in a single geographic location or located in multiple, distinct geographic locations (e.g., interconnected via private or public communication networks). Specifically, data centers or data processing centers, herein generally referred to as a “data center,” may include a number of interconnected computing systems to provide computing resources to users of the data center. The data centers may be private data centers operated on behalf of an organization or public data centers operated on behalf, or for the benefit of, the general public.

To facilitate increased utilization of data center resources, virtualization technologies may allow a single physical computing device to host one or more instances of virtual machines that appear and operate as independent computing devices to users of a data center. Each single physical computing device can be generally referred to as a host computing device. With virtualization, the single physical computing device can create, maintain, delete, or otherwise manage virtual machines in a dynamic manner. In turn, users can request computer resources from a data center, including single computing devices or a configuration of networked computing devices, and be provided with varying numbers of virtual machine resources.

In conjunction with the utilization of virtualization technologies, data centers can physically organize sets of host computing devices to allow the host computing devices to share computing device resources, such as power or communication network connectivity. Such physical organization can correspond to physical racks in which the host computing devices are mounted, generally referred to as racks of host computing devices. As the number of racks of host computing devices increases, service providers associated with data centers have difficulty distinguishing between errors or faults associated with individual host computing devices, shared resources associated with a particular rack, or distributed components utilized to manage the host computing devices. Additionally, in the event of a large scale power outage, data center power resources can be severely impacted when the host computing devices are rebooted.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and many of the attendant advantages of this disclosure will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram depicting an illustrative environment for managing host computing devices including a number of host computing devices and control components;

FIG. 2 is a block diagram illustrative of components of a control component for utilization in the environment for managing host computing devices of FIG. 1;

FIGS. 3A and 3B are block diagrams of the host computing device environment of FIG. 1 illustrating the processing of power events and determination of reboot parameters;

FIGS. 4A and 4B are block diagrams of the host computing device environment of FIG. 1 illustrating the processing of power events and determination of reboot parameters;

FIG. 5 is a flow diagram illustrative of a control component operating state processing routine implemented by a state monitoring component;

FIG. 6 is a flow diagram illustrative of a power event management routine implemented by a state monitoring component; and

FIG. 7 is a flow diagram illustrative of a reboot processing subroutine implemented by a state monitoring component.

DETAILED DESCRIPTION

Generally described, aspects of the present disclosure relate to the management of host computing devices. Specifically, systems and methods are disclosed which facilitate the management of host computing devices through the utilization of a host computing device control component, or control component. A set of host computing devices can be organized into a physical rack. Additionally, a host computing device control component is associated with each physical rack. In one aspect, the host computing device control component is in communication with the set of host computing devices to monitor performance or manage the operation of various aspects of the host computing devices in the corresponding rack. In another aspect, the control component includes a separate state monitoring component that monitors operating states of the control component and manages power events associated with one or more of the host computing devices. The state monitoring component includes a power supply separate from the control component power supply, a state processing component, a visual indicator interface, and a separate communication component to facilitate the management of the rebooting of host computing devices in the event of a power event.

With reference now to FIG. 1, a block diagram depicting an illustrative host computing device environment 100 for managing host computing devices will be described. The host computing device environment 100 includes a plurality of host computing devices 102. Illustratively, the host computing devices 102 correspond to server computing devices having one or more processors, memory, an operating system, and software applications that can be configured for a variety of purposes. Additionally, the host computing devices 102 may be configured to host one or more virtual machine instances. As illustrated in FIG. 1, the plurality of host computing devices 102 are organized according to a physical placement of a set of host computing devices, such as a rack or other support structure. The organization of each set of host computing devices 102 will be generally referred to as a rack 104.

In an illustrative embodiment, each rack 104 is associated with a host computing device control component 106, which can also be referred to as a rack control component. The host computing device control component 106 can manage the operation of the set of host computing devices 102, or components thereof, including, but not limited to, provisioning, updating, monitoring, and modifying software associated with the host computing devices. The host computing device control component 106 also includes a state monitoring component for monitoring the state of the operation of the host computing device control component 106 and providing visual indicators corresponding to the determined state of operation. Illustrative components of the host computing device control component 106 will be described with regard to FIG. 2. Although the host computing device control component 106 is illustrated with regard to a set of host computing devices organized according to physical racks, one skilled in the relevant art will appreciate that a host computing device control component 106 may be associated with sets of host computing devices organized according to different organizational criteria.

As illustrated in FIG. 1, the multiple racks 104 of host computing devices 102 may communicate via a communication network 108, such as a private or public network. For example, host computing device control components 106 from each rack 104 may be able to communicate with each other via the communication network 108, which can include a private communication network specific to host computing device control components. One skilled in the relevant art will appreciate that each rack 104 may include any number of host computing devices 102 and that the host computing device environment 100 can include any number of racks 104. Still further, the racks 104 may be further organized in a manner that does not require connectivity between all the racks in the host computing device environment.

Turning now to FIG. 2, illustrative components of a host computing device control component 106 in the host computing device environment 100 will be described. In an illustrative embodiment, the host computing device control component 106 can correspond to a wide variety of computing devices including personal computing devices, laptop computing devices, hand-held computing devices, terminal computing devices, mobile devices (e.g., mobile phones, tablet computing devices, etc.), wireless devices, various electronic devices and appliances, and the like. The host computing device environment 100 can include any number and various kinds of host computing device control components 106, which may be customized according to specific racks 104 or types of racks.

Illustratively, the host computing devices 102 may have varied local computing resources such as central processing units and architectures, memory, mass storage, graphics processing units, communication network availability and bandwidth, etc. Generally, however, each host computing device control component 106 may include various computing resources 202 that can include one or more processing units, such as one or more CPUs. The computing resources 202 may also include system memory, which may correspond to any combination of volatile and/or non-volatile storage mechanisms. The system memory may store information that provides an operating system component, various program modules, program data, or other components. The host computing device control component 106 performs functions by using the processing unit(s) to execute instructions provided by the system memory. The computing resources 202 may also include one or more input devices (keyboard, mouse device, specialized selection keys, touch screen interface, stylus, etc.) and one or more output devices (displays, printers, audio output mechanisms, etc.). The computing resources 202 may also include one or more types of removable storage and one or more types of non-removable storage. Still further, the computing resources 202 can include hardware and software components for establishing communications over the communication network 108, such as a wide area network or local area network, or via an internal communication network connecting the set of host computing devices 102. For example, the host computing device control component 106 may be equipped with networking equipment and software applications that facilitate communications via the Internet or an intranet.

However, although various computing resources 202 have been identified, one skilled in the relevant art will appreciate that various combinations of computing resources may be implemented on a host computing device control component 106 and that one or more of the identified computing resources may be optional.

As illustrated in FIG. 2, the host computing device control component 106 can include, among other hardware or software components, a management component 204 for facilitating management of the set of host computing devices 102. As discussed above, the management component 204 can facilitate various interactions with one or more of the set of host computing devices 102 including, but not limited to, provisioning, updating, monitoring, and modifying software associated with the host computing devices. Although the management component 204 is illustrated as a single component, one skilled in the relevant art will appreciate that the management component 204 may be made up of a number of components or subcomponents to carry out one or more management functions associated with the host computing device control component 106.

The host computing device control component 106 can also include a state monitoring component 206 for monitoring the state of the operation of the host computing device control component 106 and providing visual indicators corresponding to the determined state of operation. The state monitoring component 206 can include various components, or subcomponents, for monitoring or processing the state of operation of the host computing device control component 106 and providing visual indicators corresponding to the determined state of operation of the host computing device control component 106. In one aspect, the state monitoring component 206 includes a power supply 208 for providing power to one or more components of the state monitoring component 206. Illustratively, the power supply 208 is independent of any power supply associated with the host computing device control component 106 such that a loss of power by the host computing device control component 106 does not result in a loss of power to the state monitoring component 206. For example, the power supply 208 may correspond to a battery or other capacitive device. The state monitoring component 206 can also include a state processing component 210 for determining an operating state of the rack control component based on one or more inputs provided to the state monitoring component 206 or based on a failure to receive inputs. The state monitoring component 206 can also determine various reboot parameters in the event of detection of a power event (e.g., a power outage or low power event) and initiate a reboot based on the reboot parameters.

The state monitoring component 206 can also include a visual indicator interface component 212 for causing the generation of visual indicators, or other indicators, based on various determined operating states of the host computing device control component 106. In one embodiment, the visual indicator interface component 212 can include or be in direct communication with hardware for making the visual indications, including, but not limited to, liquid crystal displays (“LCD”), light emitting diodes (“LED”), sets of LCDs, sets of LEDs, multi-color LEDs, sets of multi-color LEDs, and various combinations thereof. In another embodiment, the hardware for making the visual indications may be part of the computing resources 202 such that the visual indicator interface component 212 is in communication (directly or indirectly) with that hardware to cause the generation of various visual indicators as will be described below.

The state monitoring component 206 can further include a communication component 214 for establishing communications with other state monitoring components 206 or other reporting services as described in the present application. In an illustrative embodiment, the communication component 214 can include various hardware and software components utilized to exchange information via one or more communication paths. The communication paths can include wireless communication paths (via infrared, RF, optical, terrestrial, satellite communication media, etc.), wired communication paths, or a combination thereof. Although aspects of the present disclosure will be described with regard to an illustrative communication device environment and component interactions, communication protocols, flow diagrams, and interfaces, one skilled in the relevant art will appreciate that the disclosed embodiments are illustrative in nature and should not be construed as limiting.

With reference now to FIGS. 3A and 3B, various interactions between the components of the host computing device environment 100 will be described. For purposes of the illustrative embodiment, however, many of the components of the host computing device environment 100 have been omitted. Accordingly, one skilled in the relevant art will appreciate that additional or alternative components may be utilized in conjunction with the illustrated embodiments. Additionally, although FIGS. 3A and 3B will be described with regard to an illustrative power event, one skilled in the relevant art will appreciate that similar interactions may also occur with regard to other types of event information.

With reference to FIG. 3A, the state monitoring component 206 of one or more host computing device control components 106 determines a power event. Illustratively, a power event can correspond to a determination that one or more host computing devices 102 in a corresponding rack 104 no longer have power. Alternatively, a power event can correspond to a determination that one or more aspects of the power provided to one or more host computing devices 102 have fallen below an established threshold, such as a minimum current or voltage.
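The two kinds of power event described above can be illustrated with a minimal sketch. The disclosure does not specify threshold values or a data format, so the `PowerReading` type, the `MIN_VOLTAGE` and `MIN_CURRENT` constants, and the `classify_power_event` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerReading:
    """Hypothetical power sample for one host computing device."""
    voltage: float  # volts
    current: float  # amperes


# Assumed thresholds; the disclosure only says "a minimum current or voltage".
MIN_VOLTAGE = 11.4
MIN_CURRENT = 0.5


def classify_power_event(reading: Optional[PowerReading]) -> Optional[str]:
    """Return 'outage', 'low_power', or None when no power event is determined."""
    # No reading, or a reading of zero, is treated here as a loss of power.
    if reading is None or (reading.voltage == 0 and reading.current == 0):
        return "outage"
    # Any monitored aspect below its threshold counts as a low power event.
    if reading.voltage < MIN_VOLTAGE or reading.current < MIN_CURRENT:
        return "low_power"
    return None
```

Treating a missing reading as an outage is a design assumption of this sketch, not a requirement stated in the disclosure.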

In an illustrative embodiment, because the state monitoring component 206 of the host computing device control component 106 includes a separate power supply component 208, the state monitoring component 206 can continue to operate even if there is a disruption to the power supply provided to the host computing device control component 106 and set of host computing devices 102. Accordingly, the state monitoring component 206 then establishes an alternative network connection via the communication component 214. Illustratively, the state monitoring component 206 attempts to establish communication with one or more other state monitoring components. For example, the state monitoring component 206 can attempt to construct or join a mesh network made up of other state monitoring components. In this example, the state monitoring component 206 may utilize a short range wireless communication protocol to establish a network connection with one or more state monitoring components that are within a maximum range. Because the power available from the power supply 208 may be limited, the state monitoring component 206 may be operating on a low power, short range communication path. The alternative network connection can correspond to a portion of the communication network 108 utilized by the set of host computing devices 102 or a completely separate communication network isolated from any communication channel accessible to the host computing devices. In an alternative embodiment, the host computing device control component 106 can maintain the communication channels independent of the determination of a power event (or other event). Accordingly, in these embodiments, the host computing device control component 106 would be able to utilize the previously established communication channels.
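As a rough illustration of selecting mesh neighbors "within a maximum range," the sketch below filters candidate peers by distance. It is a simplification: a real short range protocol would discover peers by radio reachability rather than known coordinates. The `MAX_RANGE_M` value, the coordinate model, and the `peers_in_range` function are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]

# Assumed maximum range, in meters, of the low power communication path.
MAX_RANGE_M = 10.0


def peers_in_range(own_pos: Position,
                   peer_positions: Dict[str, Position],
                   max_range: float = MAX_RANGE_M) -> List[str]:
    """Return the identifiers of peers no farther away than max_range."""
    return sorted(
        peer_id
        for peer_id, pos in peer_positions.items()
        if math.dist(own_pos, pos) <= max_range
    )
```

For example, with peers at `(3.0, 4.0)` and `(30.0, 0.0)` and the component at the origin, only the first peer (5 meters away) would be selected.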

Turning to FIG. 3B, in this illustrative embodiment, it is assumed that the state monitoring component 206 can establish communication with two other state monitoring components or that such a communication channel has previously been established. Once the communication network is established, the state monitoring components exchange priority parameters associated with the operation of the set of host computing devices. Illustratively, the priority parameters correspond to information that will be utilized to establish a relative priority for rebooting the set of hosts at each respective rack. In one aspect, the priority parameters can correspond to a single set of parameters that will apply to an entire set of host computing devices 102 or to a collection of parameters that will apply to portions of the set of host computing devices 102. The priority parameters can be based on a priority assigned based on one or more of the set of host computing devices 102, such as based on how individual host computing devices have been provisioned or configured. In another aspect, the priority parameters can be based on a priority assigned by a system administrator associated with one or more of the host computing devices, users of the host computing devices, or a system administrator associated with the rack 104 or host computing device control component 106. The set of priority parameters may also be based on priorities associated with the host computing device control component 106 or the data currently being processed by the one or more host computing devices 102.

Based on the exchanged boot priorities, each state monitoring component 206 can then determine reboot parameters and initiate a reboot of one or more of the set of host computing devices in accordance with the reboot parameters. In one example, a state monitoring component 206 may determine that it is associated with a lower priority than other state monitoring components and will delay initiating reboot until confirmation that the other state monitoring components have successfully initiated and completed a reboot. In another example, a state monitoring component 206 may determine that it is associated with higher priority than other state monitoring components and will immediately attempt to initiate a reboot. In a further example, a state monitoring component 206 may determine that it is associated with substantially similar priority information and can initiate a reboot of one or more of the set of host computing devices. Additionally, to stagger similarly prioritized reboots, the state monitoring component 206 can also process the priority information and add additional processing components, such as random, pseudo-random, or fixed delays in initiating a reboot. The state monitoring component 206 can also factor in additional criteria, such as performance rankings, percentage utilization, historical information, and the like in determining reboot parameters. Still further, the state monitoring component 206 can utilize thresholds that define a minimum number of host computing devices 102 that must determine a power event in order for the devices to jointly manage reboots.
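The three examples above (lower, higher, and similar priority) can be sketched as a single decision function. This is only one possible reading: it assumes priorities are integers where a larger value means higher priority, and that peers with equal priority are staggered with a random delay. The `plan_reboot` function, its parameters, and the 30 second jitter bound are hypothetical.

```python
import random
from typing import Dict, List, Optional, TypedDict


class RebootPlan(TypedDict):
    wait_for: List[str]  # peers whose reboot must be confirmed first
    delay_s: float       # additional stagger delay before initiating reboot


def plan_reboot(own_id: str,
                priorities: Dict[str, int],
                max_jitter_s: float = 30.0,
                rng: Optional[random.Random] = None) -> RebootPlan:
    """Derive illustrative reboot parameters from exchanged priorities."""
    rng = rng or random.Random()
    own = priorities[own_id]
    others = {pid: p for pid, p in priorities.items() if pid != own_id}
    # Lower priority: wait for confirmation from every higher priority peer.
    wait_for = sorted(pid for pid, p in others.items() if p > own)
    # Similar priority: add a random delay so tied peers do not reboot at once.
    tied = any(p == own for p in others.values())
    delay_s = rng.uniform(0.0, max_jitter_s) if tied else 0.0
    return {"wait_for": wait_for, "delay_s": delay_s}
```

Under these assumptions, a component with the highest, unshared priority gets an empty `wait_for` list and no delay, which corresponds to an immediate reboot attempt.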

With reference now to FIGS. 4A and 4B, various interactions between the components of the host computing device environment 100 in accordance with an alternative embodiment will be described. For purposes of the illustrative embodiment, however, many of the components of the host computing device environment 100 have been omitted. Accordingly, one skilled in the relevant art will appreciate that additional or alternative components may be utilized in conjunction with the illustrated embodiments. With reference to FIG. 4A, the state monitoring components 206 of two host computing device control components 106 determine a power event. In this illustrative embodiment, at least one state monitoring component 206 has not determined that there is a power event or is otherwise not affected by the power event. As previously described, a power event can correspond to a determination that one or more host computing devices 102 in a corresponding rack 104 no longer have power. Alternatively, a power event can correspond to a determination that one or more aspects of the power provided to one or more host computing devices 102 have fallen below an established threshold, such as a minimum current or voltage.

In an illustrative embodiment, because the state monitoring component 206 of the host computing device control component 106 includes a separate power supply component 208, the state monitoring component 206 can continue to operate even if there is a disruption to the power supply provided to the host computing device control component 106 and set of host computing devices 102. Accordingly, the state monitoring component 206 then establishes an alternative network connection via the communication component 214. Illustratively, the state monitoring component 206 attempts to establish communication with one or more other state monitoring components. As previously described, the alternative network connection can correspond to a portion of the communication network 108 utilized by the set of host computing devices 102 or a completely separate communication network isolated from any communication channel accessible to the host computing devices. As previously described, in an alternative embodiment, the host computing device control component 106 can maintain the communication channels independent of the determination of a power event (or other event). Accordingly, in these embodiments, the host computing device control component 106 would be able to utilize the previously established communication channels.

Turning to FIG. 4B, in this illustrative embodiment, it is assumed that the state monitoring component 206 can establish communication with at least one other state monitoring component that has determined a power event. As illustrated in FIG. 4B, the host computing device control component 106 in which no power event has been detected may not necessarily be utilized in the establishment of an alternative network connection. Alternatively, the host computing device control component 106 not detecting the power event may be utilized to facilitate communications between other host computing device control components or to determine whether a threshold number of host computing device control components have determined the power event.
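The threshold check mentioned above could look like the following minimal sketch. The `should_coordinate` function and the report format (a mapping from a control component identifier to whether it determined the power event) are assumptions made for illustration.

```python
from typing import Dict


def should_coordinate(reports: Dict[str, bool], threshold: int) -> bool:
    """Return True when at least `threshold` components determined the event."""
    detected = sum(1 for saw_event in reports.values() if saw_event)
    return detected >= threshold
```

In the FIG. 4A scenario, two components report the event and one does not, so a threshold of two would be met while a threshold of three would not.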

Once the communication network is established or the host computing device control component 106 can access a previously existing communication channel, the state monitoring components exchange priority information associated with the operation of the set of host computing devices. Illustratively, the priority information corresponds to information that will be utilized to establish a relative priority for rebooting the set of hosts at each respective rack. In one aspect, the priority information can correspond to information that will apply to an entire set of host computing devices 102 or to a collection of parameters that will apply to portions of the set of host computing devices 102. The priority information can be based on a priority based on one or more of the set of host computing devices 102, such as based on how individual host computing devices have been provisioned or configured. In another aspect, the priority information can be based on a priority associated with one or more of the host computing devices, users of the host computing devices, or the rack 104. The set of priority information may also be based on priorities associated with the host computing device control component 106 or the data currently being processed by the one or more host computing devices 102.

Based on the exchanged boot priorities, each state monitoring component 206 can then determine reboot parameters and initiate a reboot of one or more of the set of host computing devices in accordance with the reboot parameters. In one example, a state monitoring component 206 may determine that it is associated with a lower priority than other state monitoring components and will delay initiating reboot until confirmation that the other state monitoring components have successfully initiated and completed a reboot. In another example, a state monitoring component 206 may determine that it is associated with higher priority than other state monitoring components and will immediately attempt to initiate a reboot. In a further example, a state monitoring component 206 may determine that it is associated with substantially similar priority information and can initiate a reboot of one or more of the set of host computing devices. Additionally, to stagger similarly prioritized reboots, the state monitoring component 206 can also process the priority information and add additional processing components, such as random, pseudo-random, or fixed delays in initiating a reboot. The state monitoring component 206 can also factor in additional criteria, such as performance rankings, percentage utilization, historical information, and the like in determining reboot parameters.

Referring now to FIG. 5, a flow diagram illustrative of a control component operating state processing routine 500 will be described. For purposes of illustration, routine 500 will be described as implemented by a state monitoring component 206. However, one skilled in the art will appreciate that routine 500 may be implemented, at least in part, by other components of the host computing device environment 100. In one aspect, the state monitoring component 206 can maintain a default condition that corresponds to a non-fault determination for the host computing device control component 106. If the state monitoring component 206 obtains information associated with the function of the host computing device control component 106, the state monitoring component 206 can determine whether to modify the default state based on the information obtained from the host computing device control component 106. Additionally, if the state monitoring component 206 does not receive any information, it can determine whether to modify the default condition.

At block 502, the state monitoring component 206 sets the current state to a non-fault state. At decision block 504, the state monitoring component 206 determines whether it has received information regarding the operation of the host computing device control component 106. Illustratively, the information regarding the operation of the host computing device control component 106 can include information regarding processor performance, operating system performance, network performance, or power performance. In one embodiment, the host computing device control component 106 can be configured to transmit the information to the state monitoring component 206. In another embodiment, the state monitoring component 206 can poll the host computing device control component 106 to obtain the information.

If at decision block 504, the state monitoring component 206 determines that it has not received information regarding the operation of the host computing device control component 106, the state monitoring component 206 sets the current state to a fault condition at block 506. In this embodiment, the failure to receive information from the host computing device control component 106 can be interpreted as a fault condition. The routine 500 then proceeds to block 510, which will be described below.

Referring again to decision block 504, if the state monitoring component 206 has received information regarding the operation of the host computing device control component 106, the state monitoring component 206 processes the information to determine whether a fault condition exists at decision block 508. Illustratively, the processing of the information associated with the host computing device control component 106 can correspond to a comparison against one or more thresholds that establish the presence of fault conditions. If at decision block 508, the processing of the information regarding the host computing device control component 106 is indicative of no faults, the routine 500 returns to block 502.

If at decision block 508, the processing of the information regarding the host computing device control component 106 is indicative of a fault condition, the routine 500 proceeds to block 506 where the state monitoring component 206 sets the current state to a fault state. At block 510, the state monitoring component 206 processes the fault condition. Illustratively, the state monitoring component 206 can generate one or more visual indicators based on the determined fault condition. In one embodiment, the state monitoring component 206 can utilize a single visual indicator for any fault condition. In another embodiment, the state monitoring component 206 can utilize multiple visual indicators based on a type of fault condition. For example, the state monitoring component 206 can associate a first color indicator with a fault condition indicative of a needed-repair fault state and a second color indicator with a fault condition indicative of a troubleshooting fault state. In a further embodiment, the state monitoring component 206 can associate a separate indicator, such as a flashing indicator, with a power off condition for the host computing device control component 106. One skilled in the relevant art will appreciate that additional or alternative visual indicators may be implemented. At block 512, the routine 500 terminates.
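One pass through routine 500 can be sketched as follows. The metric names, threshold values, and indicator colors are assumptions made for this example, since the disclosure does not fix them; only the block structure (502 through 510) follows the description above.

```python
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    NON_FAULT = "non_fault"
    FAULT = "fault"


# Assumed fault thresholds for reported metrics (hypothetical names and values).
THRESHOLDS = {"cpu_load": 0.95, "packet_loss": 0.05}


def evaluate_state(info: Optional[Dict[str, float]]) -> State:
    """One evaluation of blocks 502-508 for a single reporting interval."""
    # Block 504 -> 506: no information received is interpreted as a fault.
    if info is None:
        return State.FAULT
    # Block 508: compare each reported metric against its threshold.
    for metric, limit in THRESHOLDS.items():
        if info.get(metric, 0.0) > limit:
            return State.FAULT
    # No fault indicated: remain in the default non-fault state (block 502).
    return State.NON_FAULT


def indicator_for(state: State,
                  fault_type: str = "troubleshoot",
                  power_off: bool = False) -> str:
    """Block 510: choose an illustrative visual indicator."""
    if power_off:
        return "flashing"
    if state is State.NON_FAULT:
        return "none"
    # Assumed color mapping: first color for repair, second for troubleshooting.
    return "red" if fault_type == "repair" else "amber"
```

A caller would invoke `evaluate_state` once per reporting interval, passing `None` when nothing was received from the control component, and then pass the result to `indicator_for`.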

Turning now to FIG. 6, a power event management routine 600 implemented by a state monitoring component 204 for processing event information will be described. Although routine 600 will be described with regard to managing power events related to power resources provided to the set of host computing devices 102 or the host computing device control component 106, routine 600 may be implemented with regard to the management of other events related to resource usage or resource availability by the set of host computing devices. At block 602, the state monitoring component 204 obtains power event information and determines that a power event has occurred. As previously described, a power event can correspond to a determination that one or more host computing devices 102 in a corresponding rack 104 no longer have power. Alternatively, a power event can correspond to a determination that one or more aspects of the power provided to one or more host computing devices 102 have fallen below an established threshold, such as a minimum current or voltage. The power event information may be obtained by the host computing device control component 106 based on information provided by one or more host computing devices 102 or based on resource monitoring conducted by the host computing device control component 106.
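
The two kinds of power event described for block 602 can be illustrated with a short sketch. The sample format, the default minimum values, and the `detect_power_event` helper are assumptions made for the example, not details from the disclosure.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class PowerEvent(Enum):
    NO_POWER = "no_power"
    BELOW_THRESHOLD = "below_threshold"


def detect_power_event(
    samples: Sequence[Tuple[float, float]],
    min_voltage: float = 11.4,   # assumed minimum voltage
    min_current: float = 0.5,    # assumed minimum current
) -> Optional[PowerEvent]:
    """Classify per-host (voltage, current) samples.

    Returns NO_POWER if any host reports no power, BELOW_THRESHOLD if any
    host reports voltage or current under the minimum, otherwise None.
    """
    if any(v <= 0.0 and c <= 0.0 for v, c in samples):
        return PowerEvent.NO_POWER
    if any(v < min_voltage or c < min_current for v, c in samples):
        return PowerEvent.BELOW_THRESHOLD
    return None
```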

At block 604, the state monitoring component 204 attempts to establish communication with one or more other state monitoring components via the alternate network connection. In an illustrative embodiment, because a power event (or other event) has been determined, the state monitoring component 204 can assume that the communication network 108 utilized by the set of host computing devices 102 is unavailable or unreliable. Accordingly, the state monitoring component 204 utilizes the communication component 214 to attempt to contact other state monitoring components. As previously described, the alternate network connection corresponds to one or more communication paths, including wireless communication paths (via infrared, RF, optical, terrestrial, satellite communication media, etc.), wired communication paths, or a combination thereof. Additionally, in some embodiments, the host computing device control component 106 can maintain the alternate communication channels independently of the determination of an event. In such embodiments, the host computing device control component 106 can access a previously existing communication channel.

At decision block 606, a test is conducted to determine whether the state monitoring component 204 has established a connection with at least one other state monitoring component or can access a previously established communication channel. If not, the routine 600 proceeds to block 610, which will be described in greater detail below. Alternatively, if at decision block 606 the state monitoring component 204 is able to establish a connection with at least one other state monitoring component, at block 608 the state monitoring components exchange priority information. As previously described, the priority information corresponds to information that will be utilized to establish a relative priority for rebooting the set of hosts at each respective rack. In one aspect, the priority information can correspond to information that will apply to an entire set of host computing devices 102 or to a collection of parameters that will apply to portions of the set of host computing devices 102. The priority information can be based on a priority associated with one or more of the set of host computing devices 102, such as how individual host computing devices have been provisioned or configured. In another aspect, the priority information can be based on a priority associated with one or more of the host computing devices, users of the host computing devices, or the rack 104. The set of priority information may also be based on priorities associated with the host computing device control component 106 or the data currently being processed by the one or more host computing devices 102.
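
The connection attempt and priority exchange of blocks 604 through 608 might look like the sketch below. The transport is abstracted away: each peer is represented by a hypothetical callable that returns the peer's priority or raises `ConnectionError` when the alternate channel cannot reach it. The function name and the integer priorities are likewise assumptions.

```python
from typing import Callable, Dict

# A peer "contact" takes (local_id, local_priority) and returns the peer's
# priority, or raises ConnectionError if the peer cannot be reached.
Contact = Callable[[str, int], int]


def exchange_priorities(
    local_id: str,
    local_priority: int,
    peers: Dict[str, Contact],
) -> Dict[str, int]:
    """Try every peer over the alternate channel and collect priorities.

    An empty result corresponds to the "no connection" branch of
    decision block 606.
    """
    received: Dict[str, int] = {}
    for peer_id, contact in peers.items():
        try:
            received[peer_id] = contact(local_id, local_priority)
        except ConnectionError:
            continue  # peer unreachable; keep trying the others
    return received
```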

At block 610, the state monitoring component 204 determines control parameters based on the exchanged priority information. Alternatively, if no connection was determined at decision block 606, the state monitoring component 204 determines reboot parameters based on the priority information associated with the respective host computing device control component 106. As previously described, in one example, the control parameters may correspond to reboot parameters for controlling the reboot of multiple host computing devices 102. Accordingly, a state monitoring component 204 may determine that it is associated with a lower priority than other state monitoring components and will delay initiating a reboot until confirmation that the other state monitoring components have successfully initiated and completed a reboot. In another example, a state monitoring component 204 may determine that it is associated with a higher priority than other state monitoring components and will immediately attempt to initiate a reboot. In a further example, a state monitoring component 204 may determine that it is associated with substantially similar priority information/rankings and can initiate a reboot of one or more of the set of host computing devices. Additionally, to stagger similarly prioritized reboots, the state monitoring component 204 can also process the priority information and add additional processing functionality, such as incorporating random, pseudo-random or fixed delays in initiating a reboot. The state monitoring component 204 can also factor in additional criteria, such as performance rankings, percentage utilization, historical information, and the like in determining reboot parameters.
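
A minimal sketch of the parameter determination at block 610 follows. It assumes integer priorities in which a larger number means higher priority, and it uses a pseudo-random delay only when a peer has equal priority; the `RebootParameters` type, the function name, and the 30-second default are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RebootParameters:
    wait_for: Tuple[str, ...]   # higher-priority peers to confirm first
    delay_seconds: float        # stagger delay before initiating reboot


def determine_reboot_parameters(
    own_priority: int,
    peer_priorities: Dict[str, int],
    max_jitter: float = 30.0,            # assumed upper bound on the delay
    rng: Optional[random.Random] = None,
) -> RebootParameters:
    """Derive reboot parameters from local and exchanged priorities."""
    if rng is None:
        rng = random.Random()
    # Peers with strictly higher priority should reboot first.
    higher = tuple(sorted(
        pid for pid, p in peer_priorities.items() if p > own_priority
    ))
    # Equal-priority peers are staggered with a pseudo-random delay.
    has_equal = any(p == own_priority for p in peer_priorities.values())
    delay = rng.uniform(0.0, max_jitter) if has_equal else 0.0
    return RebootParameters(wait_for=higher, delay_seconds=delay)
```

When no priorities were exchanged, `peer_priorities` is empty and the result is an immediate reboot with nothing to wait for, which matches the case where only local priority information is available.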

At block 612, the state monitoring component 204 causes the processing of the control parameters for a set of host computing devices 102. In one embodiment, the host computing device control component 106 can utilize various functions and interfaces for controlling the set of host computing devices 102 to initiate a reboot according to reboot parameters. In other embodiments, the control parameters can correspond to the initialization of a power management procedure. In another example, the control parameters can correspond to the initialization of a security protocol or a communication protocol. A subroutine for processing control parameters that cause the reboot of the set of host computing devices 102 will be described with regard to FIG. 7. However, one skilled in the relevant art will appreciate that additional or alternative control parameters may be implemented. At block 614, the routine 600 terminates.

Turning now to FIG. 7, a reboot processing subroutine 700 implemented by a state monitoring component 204 for initiating a reboot of the set of host computing devices will be described. Illustratively, subroutine 700 may correspond to block 612 of routine 600 (FIG. 6). At block 702, the state monitoring component 204 obtains priority information for the set of host computing devices 102. At block 704, the state monitoring component 204 obtains a current priority associated with reboot of other racks 104. In an illustrative embodiment, the state monitoring component 204 will attempt to exchange information with other state monitoring components via the alternate communication network regarding priorities associated with other racks of host computing devices 102 and whether respective host computing device control components 106 have initiated a successful reboot of their host computing devices. If no communication network is available or if no other state monitoring component 204 is available, the current state monitoring component 204 can assume it has the highest priority.

At decision block 706, a test is conducted to determine whether the current priority is greater than the priority information for the set of host computing devices 102. If so, the state monitoring component 204 then proceeds to decision block 708 to determine whether it has received information indicating that a state monitoring component 204 associated with a higher priority has successfully initiated a reboot of its respective set of host computing devices 102. In an illustrative embodiment, the host computing device control component 106 can also determine at decision block 706 whether a threshold number of host computing device control components have determined the event to cause the implementation of a management reboot. Additionally, in other embodiments, decision block 708 may be omitted. If confirmation is received, the subroutine 700 returns to block 704 to obtain an updated current priority.

Alternatively, if at decision block 706 the state monitoring component 204 determines that it has a higher or equal priority to the current priority exchanged between the state monitoring components, or if at decision block 708 no confirmation is received from state monitoring components associated with a higher priority, at block 710 the state monitoring component 204 can process any additional reboot criteria. In one embodiment, the state monitoring component 204 can implement additional random, pseudo-random or fixed delays in initiating a reboot. The state monitoring component 204 can also factor in additional criteria, such as performance rankings, percentage utilization, historical information, and the like in determining an order for initiating a reboot of the set of host computing devices or initiating a delay in the reboot of the set of host computing devices. Still further, the state monitoring component 204 can also select subsets of the set of host computing devices 102 to reboot based on different priority information associated with the set of host computing devices 102. Still further, the host computing device control component 106 can determine that no reboot is required.

Based on the processed reboot criteria, at block 712, the state monitoring component 204 initiates a reboot of the set of host computing devices (or subsets thereof). As previously described, the host computing device control component 106 can utilize various functions and interfaces for controlling the set of host computing devices 102 to initiate the reboot of the set of host computing devices. At block 714, the subroutine 700 returns.
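
The control flow of subroutine 700 (blocks 704 through 712) can be summarized in a short sketch. The callables stand in for the alternate-channel exchange, the confirmation check, and the reboot interface, all of which are hypothetical here; larger integers are assumed to mean higher priority, and the `max_rounds` bound is an added safeguard that the disclosure does not describe.

```python
from typing import Callable, Optional


def run_reboot_subroutine(
    own_priority: int,
    get_current_priority: Callable[[], Optional[int]],
    confirmation_received: Callable[[], bool],
    initiate_reboot: Callable[[], None],
    max_rounds: int = 10,   # assumed bound so the loop always terminates
) -> int:
    """Follow blocks 704-712 and return how many priority checks were made."""
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # Block 704: obtain the current priority; None means no peer is
        # reachable, so this component assumes the highest priority.
        current = get_current_priority()
        # Block 706: proceed if our priority is higher than or equal to it.
        if current is None or own_priority >= current:
            break
        # Block 708: without a confirmation, fall through to block 710;
        # with one, loop back to block 704 for an updated priority.
        if not confirmation_received():
            break
    # Blocks 710-712: additional criteria (delays, subsets) would be
    # applied here before the reboot is initiated.
    initiate_reboot()
    return rounds
```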

It will be appreciated by those skilled in the art and others that all of the functions described in this disclosure may be embodied in software executed by one or more processors of the disclosed components and mobile communication devices. The software may be persistently stored in any type of non-volatile storage.

Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

Any process descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art. It will further be appreciated that the data and/or components described above may be stored on a computer-readable medium and loaded into memory of the computing device using a drive mechanism associated with a computer-readable medium storing the computer-executable components, such as a CD-ROM, DVD-ROM, or network interface. Further, the components and/or data can be included in a single device or distributed in any manner. Accordingly, general purpose computing devices may be configured to implement the processes, algorithms, and methodology of the present disclosure with the processing and/or execution of the various data and/or components described above.

It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

What is claimed is:
 1. A system for managing host computing devices, the host computing devices organized into a physical rack, the system comprising: a rack control component in communication with the host computing devices, the rack control component including: a management component for monitoring and controlling one or more aspects of the operation of the grouping of host computing devices; and a state monitoring component, the state monitoring component having a power supply independent of a power supply associated with the rack control component, a state processing component for determining power event information, and a communication interface for establishing communication with other state monitoring components via a communication channel independent of a communication channel associated with the grouping of host computing devices; wherein the state monitoring component attempts exchange communications via a communication channel with at least one other state monitoring component based on a determined power event, wherein the state monitoring component exchanges priority information with the at least one other state monitoring component if a communication channel is successfully established, and wherein the state monitoring component determines reboot parameters based on the determined power event.
 2. The system as recited in claim 1, wherein the reboot parameters correspond to a determined reboot order based on priority information.
 3. The system as recited in claim 1, wherein the reboot parameters correspond to a delay in initiating a reboot of the grouping of host computing devices.
 4. The system as recited in claim 3, wherein the delay in initiating the reboot of the grouping of host computing devices is based, at least in part, on at least one of a random delay, a pseudo-random delay and a fixed delay.
 5. The system as recited in claim 3, wherein the delay in initiating the reboot of the grouping of host computing devices is based, at least in part, on evaluation of additional criteria.
 6. The system as recited in claim 5, wherein the additional criteria include receipt of a confirmation of a reboot of another host computing device.
 7. The system as recited in claim 1, wherein the state monitoring component causes initiation of a reboot of the grouping of host computing devices in accordance with the reboot parameters.
 8. The system as recited in claim 1, wherein the state monitoring component further includes at least one visual indicator interface, wherein the state monitoring component causes the generation of a visual indication based on the power event information.
 9. The system as recited in claim 1, wherein the power event information corresponds to a determination of no power to the grouping of host computing devices.
 10. The system as recited in claim 1, wherein the power event information corresponds to a determination of power to the grouping of host computing devices below a threshold.
 11. The system as recited in claim 1, wherein the state monitoring component can initiate the establishment of the communication channel.
 12. A computer-implemented method for managing host computing devices, the method comprising: under control of one or more processors configured with specific executable instructions for implementing a state monitoring component and an independent power supply, obtaining event information, the event information corresponding to one or more resources utilized by a grouping of host computing devices associated with the state monitoring component; causing communications via a communication channel with at least one other state monitoring component, wherein the at least one other state monitoring component is associated with a different grouping of host computing devices; if a communication channel is successfully established, exchanging priority information with the at least one other state monitoring component; determining reboot parameters for the grouping of host computing devices, the reboot parameters based on at least one of priority information associated with the grouping of host computing devices and priority information associated with the different grouping of host computing devices; and causing the processing of the determined reboot parameters.
 13. The computer-implemented method as recited in claim 12, wherein causing the instantiation of a communication channel with at least one other state monitoring component includes causing the instantiation of a communication channel independent of a communication channel associated with the grouping of host computing devices.
 14. The computer-implemented method as recited in claim 12, wherein causing the instantiation of a communication channel with at least one other state monitoring component includes causing the instantiation of at least one of a wired and a wireless communication channel.
 15. The computer-implemented method as recited in claim 12, wherein obtaining event information corresponds to obtaining event information corresponding to power event information.
 16. The computer-implemented method as recited in claim 15, wherein the power event information corresponds to a determination of no power to the grouping of host computing devices.
 17. The computer-implemented method as recited in claim 15, wherein the power event information corresponds to a determination of power to the grouping of host computing devices below a threshold.
 18. The computer-implemented method as recited in claim 12, wherein determining reboot parameters for the grouping of host computing devices includes determining reboot parameters based solely on priority information associated with the grouping of host computing devices if the instantiation of a communication channel is not successful.
 19. The computer-implemented method as recited in claim 12, wherein the reboot parameters correspond to a determined reboot order based on priority information.
 20. The computer-implemented method as recited in claim 12, wherein the reboot parameters correspond to a delay in initiating a reboot of the grouping of host computing devices.
 21. The computer-implemented method as recited in claim 20, wherein the delay in initiating the reboot of the grouping of host computing devices is based, at least in part, on at least one of a random delay, a pseudo-random delay and a fixed delay.
 22. The computer-implemented method as recited in claim 20, wherein the delay in initiating the reboot of the grouping of host computing devices is based, at least in part, on evaluation of additional criteria.
 23. The computer-implemented method as recited in claim 12 further comprising causing the instantiation of the communication channel in response to the event information.
 24. The computer-implemented method as recited in claim 12, wherein causing the processing of the determined reboot parameters includes causing the initiation of a reboot based on the determined reboot parameters.
 25. The computer-implemented method as recited in claim 12, wherein causing the processing the determined reboot parameters includes determining whether a sufficient number of host computing devices are associated with the event information prior to initiating a reboot of the host computing device.
 26. A state monitoring component for managing a grouping of host computing devices associated with the state monitoring component, the control state monitoring component including one or more processors and a power supply, the state monitoring component comprising: a power supply independent of a power supply associated with the grouping of host computing devices; a state processing component for determining event information; and a communication interface for communicating with other state monitoring components via a communication channel independent of a communication channel associated with the grouping of host computing devices; wherein the state monitoring component attempts to access a communication channel including at least one other state monitoring component based on a determined event, wherein the state monitoring component exchanges priority information with the at least one other state monitoring component if a communication channel is successfully established, and wherein the state monitoring component determines control parameters for the grouping of host computing devices based on the determined event.
 27. The system as recited in claim 26, wherein the control parameters correspond to a determined reboot order based on priority information.
 28. The system as recited in claim 26, wherein the control parameters correspond to the limitation of communications.
 29. The system as recited in claim 26, wherein the control parameters correspond to management of a power protocol.
 30. The system as recited in claim 26, wherein the control parameters correspond to management of a security protocol.
 31. The system as recited in claim 26, wherein the event information corresponds to power event information.
 32. The system as recited in claim 31, wherein the power event information corresponds to a determination of no power to the grouping of host computing devices.
 33. The system as recited in claim 31, wherein the power event information corresponds to a determination of power to the grouping of host computing devices below a threshold.